Step-by-Step Guide to Building a Smart Crop Selection App Using Streamlit
Agriculture plays a crucial role in feeding the world, and selecting the right crop for a given soil type can significantly improve productivity. With Machine Learning and Streamlit, we can build a Smart Crop Selection App that predicts the most suitable crop based on soil properties such as Nitrogen (N), Phosphorus (P), Potassium (K), and pH levels.
In this blog post, we will walk you through how to use Streamlit to build and deploy a Smart Crop Selection web application.
🚀 What is Streamlit?
Streamlit is an open-source Python library that makes it easy to build and share interactive web applications for data science and machine learning projects. It requires minimal code and allows quick visualization of models and data.
✅ Why Use Streamlit for Smart Crop Selection?
- Easy to use – Minimal front-end coding required.
- Interactive UI – Users can input soil data and get real-time predictions.
- Fast Deployment – Easily deployable on Streamlit Community Cloud.
- Open-source & Free – No need for expensive infrastructure.
🛠️ Step 1: Set Up Your Environment
Before building the app, install the required dependencies.
📥 Install Required Libraries
Run the following command in your terminal:
pip install streamlit pandas numpy scikit-learn joblib xgboost
This installs:
- Streamlit → To create the web app.
- Pandas & NumPy → For handling data.
- scikit-learn → For preprocessing and machine learning.
- XGBoost → The machine learning model for crop classification.
- Joblib → To save and load trained models.
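Before moving on, it can help to confirm that the install actually worked. The sketch below is one way to do that with the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for this post, and the package list simply mirrors the pip command above.

```python
from importlib import metadata

# Distribution names taken from the pip command above.
REQUIRED = ["streamlit", "pandas", "numpy", "scikit-learn", "joblib", "xgboost"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs another `pip install` before the later steps will run.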
📂 Step 2: Organize Your Project
Create a directory structure:
smart_crop_selection/
├── app.py                  # Streamlit web app
├── data/
│   └── soil_measures.csv   # Dataset
├── models/
│   ├── crop_model.pkl      # Trained ML model
│   ├── label_encoder.pkl   # Encode data labels
│   └── scaler.pkl          # Scale data inputs
├── scripts/
│   ├── preprocess.py       # Data preprocessing functions
│   └── train_model.py      # Model training script
├── requirements.txt        # List of dependencies
└── README.md               # Project documentation
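If you would rather not create these folders by hand, a few lines of `pathlib` can do it. This is only a convenience sketch: `create_skeleton` is a hypothetical helper, and it creates empty placeholder files only. The dataset and the `.pkl` files come from later steps.

```python
from pathlib import Path

# Folders and empty placeholder files from the layout above.
FOLDERS = ["data", "models", "scripts"]
FILES = [
    "app.py",
    "scripts/preprocess.py",
    "scripts/train_model.py",
    "requirements.txt",
    "README.md",
]


def create_skeleton(root):
    """Create the project folders and empty source files under `root`."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates the file if needed and never erases existing contents
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_skeleton("smart_crop_selection")` from the directory where you keep your projects produces the layout shown above, minus the data and model files.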
📊 Step 3: Prepare and Preprocess Data
We need to preprocess the soil dataset before training the model.
🔹 Create preprocess.py
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
import joblib

# All paths are relative to the project root, and the models/ folder must already exist.
NUMERICAL_FEATURES = ['N', 'P', 'K', 'ph']


def preprocess_data(df, fit_scaler=True, fit_encoder=True):
    """
    Preprocesses the input dataset:
    - Encodes the target variable
    - Standardizes numerical features

    Parameters:
        df (pd.DataFrame): Input data containing soil parameters and crop labels
        fit_scaler (bool): Whether to fit a new StandardScaler (True for training, False for inference)
        fit_encoder (bool): Whether to fit a new LabelEncoder (True for training, False for inference)

    Returns:
        tuple: (Preprocessed feature matrix X, Encoded target variable y, scaler, label_encoder)
    """
    # Separate features and target ('crop' is the target column)
    X = df.drop(columns=['crop'])
    y = df['crop']

    # Encode the target variable
    if fit_encoder:
        label_encoder = LabelEncoder()
        y_encoded = label_encoder.fit_transform(y)
        joblib.dump(label_encoder, 'models/label_encoder.pkl')  # Save encoder
    else:
        label_encoder = joblib.load('models/label_encoder.pkl')
        y_encoded = label_encoder.transform(y)

    # Standardize numerical features on a copy so the caller's data is untouched
    X_scaled = X.copy()
    X_scaled[NUMERICAL_FEATURES] = X_scaled[NUMERICAL_FEATURES].astype('float64')
    if fit_scaler:
        scaler = StandardScaler()
        X_scaled[NUMERICAL_FEATURES] = scaler.fit_transform(X_scaled[NUMERICAL_FEATURES])
        joblib.dump(scaler, 'models/scaler.pkl')  # Save scaler
    else:
        scaler = joblib.load('models/scaler.pkl')
        X_scaled[NUMERICAL_FEATURES] = scaler.transform(X_scaled[NUMERICAL_FEATURES])

    return X_scaled, y_encoded, scaler, label_encoder


def preprocess_input(input_df):
    """
    Prepares new input data for model prediction.

    Parameters:
        input_df (pd.DataFrame): User input containing soil parameters

    Returns:
        pd.DataFrame: Scaled input ready for prediction
    """
    scaler = joblib.load('models/scaler.pkl')  # Load pre-trained scaler
    scaled_df = input_df.copy()  # Do not modify the caller's DataFrame
    scaled_df[NUMERICAL_FEATURES] = scaled_df[NUMERICAL_FEATURES].astype('float64')
    scaled_df[NUMERICAL_FEATURES] = scaler.transform(scaled_df[NUMERICAL_FEATURES])
    return scaled_df
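To make the two preprocessing steps less of a black box, here is a small pure-Python sketch of what they compute. It is not how scikit-learn implements them internally; it only mirrors the documented behaviour: `StandardScaler` subtracts the column mean and divides by the population standard deviation, and `LabelEncoder` numbers the classes in sorted order. The function names and the sample values are made up for illustration.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score a column: subtract the mean, divide by the population std."""
    mean = fmean(values)
    std = pstdev(values) or 1.0  # constant column: avoid dividing by zero
    return [(v - mean) / std for v in values]


def encode_labels(labels):
    """Map each distinct label to an integer, numbering classes in sorted order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


# Made-up values, for illustration only (not taken from soil_measures.csv).
ph_column = [5.0, 6.0, 7.0]
crops = ["rice", "maize", "rice"]

print(standardize(ph_column))
print(encode_labels(crops))
```

The important point for the app is that the mean and standard deviation learned during training are stored in `scaler.pkl` and reused for every new input, so a user's soil values end up on the same scale the model was trained on.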
🤖 Step 4: Train the Machine Learning Model
We use XGBoost to predict the best crop based on soil parameters.
🔹 Create train_model.py
import pandas as pd
import joblib
import xgboost as xgb
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import accuracy_score
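# The relative import below only works when the script is run as a module
# from the project root: python -m scripts.train_model
from .preprocess import preprocess_data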
# Load dataset
df = pd.read_csv('data/soil_measures.csv')
# Preprocess data
X_scaled, y_encoded, scaler, label_encoder = preprocess_data(df, fit_scaler=True, fit_encoder=True)
# Split the data with stratification
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_encoded, test_size=0.2, stratify=y_encoded, random_state=42)
# Define parameter grids for XGBoost
xgb_params = {
    'max_depth': [10, 20],
    'learning_rate': [0.001, 0.1, 1],
    'n_estimators': [100, 200, 500],
    'subsample': [0.7, 0.9]
}
# Perform hyperparameter tuning using cross-validation
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# XGBoost tuning
xgb_model = xgb.XGBClassifier(eval_metric='mlogloss', verbosity=1)
xgb_grid = GridSearchCV(estimator=xgb_model, param_grid=xgb_params, cv=kf, scoring='accuracy', n_jobs=1, verbose=1)
xgb_grid.fit(X_train, y_train)
# Best model from GridSearchCV
best_xgb_model = xgb_grid.best_estimator_
# Evaluate model
y_pred = best_xgb_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Best XGBoost Model Accuracy: {accuracy:.4f}')
# Save the best model
joblib.dump(best_xgb_model, 'models/crop_model.pkl')
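Grid search can take a while, so it is worth knowing how much work the script above asks for. The sketch below counts the parameter combinations in the same grid and multiplies by the number of cross-validation folds; it uses only the standard library and trains nothing.

```python
from itertools import product

# Same grid as in train_model.py above.
xgb_params = {
    "max_depth": [10, 20],
    "learning_rate": [0.001, 0.1, 1],
    "n_estimators": [100, 200, 500],
    "subsample": [0.7, 0.9],
}
n_splits = 5  # folds used by StratifiedKFold

# Every combination of one value per parameter is one candidate model.
combinations = list(product(*xgb_params.values()))
total_fits = len(combinations) * n_splits

print(f"{len(combinations)} parameter combinations")
print(f"{total_fits} model fits during the search")
```

With `n_jobs=1` those fits run one after another. If training feels slow, shrinking the grid or setting `n_jobs=-1` to use all CPU cores are the usual first things to try.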
🖥️ Step 5: Build the Streamlit Web App
🔹 Create app.py
import streamlit as st
import pandas as pd
import numpy as np
import joblib
from scripts.preprocess import preprocess_input
from sklearn.metrics import accuracy_score
# Load trained model and label encoder
crop_model = joblib.load('models/crop_model.pkl')
label_encoder = joblib.load('models/label_encoder.pkl')
# Streamlit UI
st.title("🌱 Smart Crop Selection Web App")
st.markdown("Enter soil parameters to get the best crop recommendation.")
# Sidebar inputs
st.sidebar.header("Input Soil Parameters")
N = st.sidebar.number_input("Nitrogen (N)", min_value=0.0, max_value=160.0, value=30.0)
P = st.sidebar.number_input("Phosphorus (P)", min_value=0.0, max_value=160.0, value=60.0)
K = st.sidebar.number_input("Potassium (K)", min_value=0.0, max_value=250.0, value=50.0)
pH = st.sidebar.number_input("pH Level", min_value=0.0, max_value=10.0, value=6.0)
# Convert inputs to DataFrame
input_data = pd.DataFrame([[N, P, K, pH]], columns=['N', 'P', 'K', 'ph'])
# Preprocess input
test_input = preprocess_input(input_data)
# Prediction
if st.sidebar.button("Predict Crop"):
    predicted_crop_index = crop_model.predict(test_input)[0]
    recommended_crop = label_encoder.inverse_transform([predicted_crop_index])[0]
    # Accuracy on the full dataset. Most of these rows were used for training,
    # so this is an optimistic figure for the model as a whole, not a
    # confidence score for this particular prediction.
    dataset = pd.read_csv('data/soil_measures.csv')
    X_all = preprocess_input(dataset.drop(columns=['crop']))
    y_all = label_encoder.transform(dataset['crop'])
    accuracy = accuracy_score(y_all, crop_model.predict(X_all)) * 100
    # Display results
    st.subheader("🌾 Recommended Crop:")
    st.write(f"**{recommended_crop}**")
    st.write(f"Model accuracy on the full dataset: **{accuracy:.2f}%**")
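The sidebar widgets already keep values inside their `min_value` and `max_value` limits, but that protection disappears if you reuse the prediction logic outside Streamlit, for example in a test or a batch script. A small check like the one below could cover that case. `validate_soil_inputs` is a hypothetical helper, not part of the app above, and its bounds are simply copied from the sidebar.

```python
# Bounds copied from the sidebar widgets in app.py.
BOUNDS = {
    "N": (0.0, 160.0),
    "P": (0.0, 160.0),
    "K": (0.0, 250.0),
    "ph": (0.0, 10.0),
}


def validate_soil_inputs(values):
    """Return a list of error messages; an empty list means the inputs are valid."""
    errors = []
    for name, (low, high) in BOUNDS.items():
        if name not in values:
            errors.append(f"{name} is missing")
        elif not low <= values[name] <= high:
            errors.append(f"{name}={values[name]} is outside [{low}, {high}]")
    return errors
```

For example, `validate_soil_inputs({"N": 30.0, "P": 60.0, "K": 50.0, "ph": 6.0})` returns an empty list, while a pH of 12.0 produces one error message.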
🚀 Step 6: Run and Deploy the App
🔹 Run Locally
streamlit run app.py
🔹 Deploy on Streamlit Community Cloud
- Push your code to GitHub.
- Go to Streamlit Community Cloud.
- Select your repository and set app.py as the main file.
- Click Deploy.
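Streamlit Community Cloud installs your app's dependencies from the requirements.txt file in the repository, so that file needs to list everything from Step 1. One way to generate it is sketched below. `write_requirements` is a hypothetical helper, and the package names are left unpinned here; pinning exact versions is a choice you may want to make yourself.

```python
from pathlib import Path

# Same packages as the pip command in Step 1 (unpinned).
PACKAGES = ["streamlit", "pandas", "numpy", "scikit-learn", "joblib", "xgboost"]


def write_requirements(path, packages=PACKAGES):
    """Write one package name per line to `path` and return the written text."""
    text = "\n".join(packages) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text
```

Running `write_requirements("requirements.txt")` from the project root creates the file. Remember to commit it, together with the trained files in models/, before deploying.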
Live Demo
You can access the deployed version of this app: https://smartcropselection.streamlit.app/
Deployment Code & Model Development Insights
The deployment code of the project is available here.
The data exploratory analysis and model development insights are discussed in detail in my previous post.
✅ Conclusion
With Streamlit and Machine Learning, we have successfully built a Smart Crop Selection App that predicts the best crop based on soil parameters. 🚜🌱
Try deploying your own version and improve it by adding weather data or more soil parameters!